The Physics of Text: Ontological Realism in Information Extraction
Authors
Abstract
We propose an approach to extracting information from text based on the hypothesis that text sometimes describes the world. The hypothesis is embodied in a generative probability model that describes (1) possible worlds and the facts they might contain, (2) how an author chooses facts to express, and (3) how those facts are expressed in text. Given text, information extraction is done by computing a posterior over the worlds that might have generated it. As a by-product, this unsupervised learning process discovers new relations and their textual expressions, extracts new facts, disambiguates instances of polysemous expressions, and resolves entity references. The probability model also explains and improves on Brin’s bootstrapping heuristic, which underlies many open information extraction systems. Preliminary results on a small corpus of New York Times text suggest that the approach is effective.
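The three-part generative story above (worlds, fact selection, textual expression) and the posterior over worlds can be illustrated with a toy enumeration. This is a minimal sketch, not the authors' model: the two relations, the pattern strings, every probability value, and the function names `prior`, `likelihood`, and `posterior` are assumptions chosen only for illustration, and it covers a single entity pair with at most one sentence per fact.

```python
from itertools import product

# All names and numbers below are illustrative assumptions, not from the paper.
FACTS = ["born_in", "lives_in"]   # candidate facts about one entity pair
PRIOR_TRUE = 0.3                  # (1) prior probability that a fact holds
P_MENTION = 0.5                   # (2) chance the author expresses a true fact
PATTERNS = {                      # (3) how each relation is worded in text
    "born_in": {"was born in": 0.8, "is from": 0.2},
    "lives_in": {"lives in": 0.7, "is from": 0.3},
}


def prior(world):
    """Probability of a world, with each fact independently true or false."""
    p = 1.0
    for fact in FACTS:
        p *= PRIOR_TRUE if fact in world else 1.0 - PRIOR_TRUE
    return p


def likelihood(observed, world):
    """P(observed patterns | world), summing over the hidden author choices."""
    target = tuple(sorted(observed))
    # Each true fact is either left unsaid (None) or expressed by one pattern.
    options = [
        [(None, 1.0 - P_MENTION)]
        + [(pat, P_MENTION * q) for pat, q in PATTERNS[fact].items()]
        for fact in sorted(world)
    ]
    total = 0.0
    for combo in product(*options):
        emitted = tuple(sorted(pat for pat, _ in combo if pat is not None))
        if emitted == target:
            p = 1.0
            for _, q in combo:
                p *= q
            total += p
    return total


def posterior(observed):
    """Posterior over all worlds given the observed patterns."""
    worlds = [
        frozenset(f for f, keep in zip(FACTS, bits) if keep)
        for bits in product([False, True], repeat=len(FACTS))
    ]
    joint = {w: prior(w) * likelihood(observed, w) for w in worlds}
    z = sum(joint.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under this model")
    return {w: p / z for w, p in joint.items()}


if __name__ == "__main__":
    # "is from" is ambiguous between the two relations in this toy setup.
    for world, p in posterior(["is from"]).items():
        print(sorted(world), round(p, 4))
```

In this toy setup the ambiguous pattern "is from" leaves probability mass on both relations, which is a small-scale analogue of disambiguating a polysemous expression. Exhaustive enumeration only works for a handful of facts; a realistic corpus would need approximate inference.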
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Scientific Realism and High Energy Physics
The paper discusses major implications of high energy physics for the scientific realism debate. The first part analyses the ways in which aspects of the empirically well-confirmed standard model of particle physics are relevant for a reassessment of entity realism, ontological realism and structural realism. The second part looks at the implications of more far-reaching concepts like string th...
Observe the Split Between the Paths: from Persian Tadhkirah to magical realism: A discourse in the review of Mystical Realism by Mehrnaz Shirazi Adel
From Persian Tadhkirah To Magical Realism: A Discourse in The Review of Mystical Realism Mehrnaz Shirazi Adel /Ph.D. student of Persian Literature at the Institute of Humanities and Cultural Studies/ Abstract Mystical Realism; A Comparison of Suffi Tadhkirah writing and Magical Realism with Emphasis on Marquez's Works by Mohammad Roodgar is in effect his doctoral thesi...
Document clustering based on ontology and a fuzzy approach
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
The Absence of ‘Paucity’ & ‘Momentariness’: Two New Components of Magical Realism in Günter Grass's The Tin Drum
This article presents the question whether it is correct to classify Günter Grass’s The Tin Drum as a work of magical realism. A brief scrutiny of the elements of magical realism, particularly Authorial Reticence and concept of Hesitation indicates that contrary to the advertisement of certain sources and publishers, this novel in certain circumstances, contradicts and opposes these two indispe...
Publication date: 2016